Short text classification applied to item description: Some methods evaluation

Abstract

The increasing demand for information classification based on content in the age of social media and e-commerce has led to the need for automated product classification using their descriptions. This study aims to evaluate various techniques for this task, with a focus on descriptions written in Portuguese. A pipeline is implemented to preprocess the data, including lowercasing, accent removal, and unigram tokenization. The bag of words method is then used to convert the text into numerical data, and five techniques are applied: argmaxtf, argmaxtfnorm, and argmaxtfidf from information retrieval, and two machine learning methods, logistic regression and support vector machines. The performance of each technique is evaluated using simple accuracy via thirty-fold cross validation. The results show that [...] achieves the highest mean accuracy among the techniques.
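The preprocessing and bag of words steps named in the abstract can be illustrated with a minimal, standard-library-only sketch. Everything here is an assumption made for illustration: the function names, the toy descriptions, the choice to treat a unigram as a run of ASCII letters or digits, and above all `predict_argmax_tf`, which is only one plausible reading of "argmaxtf" (the abstract does not define it). The logistic regression, support vector machine, and cross-validation steps are not shown.

```python
import re
import unicodedata
from collections import Counter, defaultdict


def preprocess(text):
    """Lowercase, remove accents, and split into unigram tokens."""
    lowered = text.lower()
    # NFKD separates base letters from combining marks ("ã" -> "a" + tilde).
    decomposed = unicodedata.normalize("NFKD", lowered)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Assumption: a unigram is a run of ASCII letters or digits.
    return re.findall(r"[a-z0-9]+", unaccented)


def bag_of_words(text):
    """Map a description to its token counts."""
    return Counter(preprocess(text))


def fit_class_term_counts(descriptions, labels):
    """Sum token counts per class over the training descriptions."""
    counts = defaultdict(Counter)
    for description, label in zip(descriptions, labels):
        counts[label].update(bag_of_words(description))
    return counts


def predict_argmax_tf(class_term_counts, description):
    """Hypothetical argmax-tf rule: pick the class whose summed term
    frequency over the description's tokens is largest."""
    bow = bag_of_words(description)
    scores = {
        label: sum(term_counts[token] * n for token, n in bow.items())
        for label, term_counts in class_term_counts.items()
    }
    # Iterate labels in sorted order so ties are broken deterministically.
    return max(sorted(scores), key=scores.get)


# Toy data, invented for illustration only.
train_texts = [
    "Feijão carioca 1kg",
    "Arroz branco tipo 1",
    "Sabão em pó 1kg",
    "Detergente líquido neutro",
]
train_labels = ["food", "food", "cleaning", "cleaning"]
model = fit_class_term_counts(train_texts, train_labels)
```

Calling `predict_argmax_tf(model, "feijão preto")` on this toy data scores each class by how often the description's tokens appeared in that class's training descriptions and returns the best-scoring label.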

Similar resources

An evaluation of text classification methods for literary study

This article presents an empirical evaluation of text classification methods in the literary domain. This study compared the performance of two popular algorithms, naïve Bayes and support vector machines (SVMs), in two literary text classification tasks: the eroticism classification of Dickinson’s poems and the sentimentalism classification of chapters in early American novels. The algorithms were a...

Evaluating Text Clustering Methods for Text Classification

In this project report, I evaluate several text clustering approaches and how they can be used for text classification. The particular tasks are topic classification on the 20 Newsgroups dataset and sentiment classification on a restaurant reviews dataset. Future directions for improving the results are also discussed.

A Redundant Covering Algorithm Applied to Text Classification

Covering algorithms for learning rule sets tend toward learning concise rule sets based on the training data. This bias may not be appropriate in the domain of text classification due to the large number of informative features these domains typically contain. We present a basic covering algorithm, DAIRY, that learns unordered rule sets, and present two extensions that encourage the rule learne...

Text Classification with Tournament Methods

This paper compares the effectiveness of n-way (n > 2) classification using a probabilistic classifier to the use of multiple binary probabilistic classifiers. We describe the use of binary classifiers in both Round Robin and Elimination tournaments, and compare both tournament methods and n-way classification when determining the language of origin of speakers (both native and non-native Engli...

Transductive LSI for Short Text Classification Problems

This paper presents work that uses Transductive Latent Semantic Indexing (LSI) for text classification. In addition to relying on labeled training data, we improve classification accuracy by incorporating the set of test examples in the classification process. Rather than performing LSI’s singular value decomposition (SVD) process solely on the training data, we instead use an expanded term-by-...

Journal

Journal title: Semina

Year: 2022

ISSN: 1676-5435, 1679-0367

DOI: https://doi.org/10.5433/1679-0375.2022v43n2p189